Conjugate Prior

In Bayesian probability theory, if, for a given likelihood function p(x|θ), the posterior distribution p(θ|x) is in the same family as the prior probability distribution p(θ), the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior for the likelihood function.
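
As a concrete illustration (a standard textbook case, not taken from the text above), assume a Bernoulli likelihood with success probability θ and a Beta(α, β) prior. The symbols α, β, k and n are introduced here for the example only. After observing k successes in n trials:

$$
p(\theta \mid x) \;\propto\; \underbrace{\theta^{k}(1-\theta)^{n-k}}_{\text{likelihood}} \cdot \underbrace{\theta^{\alpha-1}(1-\theta)^{\beta-1}}_{\text{prior}} \;=\; \theta^{\alpha+k-1}(1-\theta)^{\beta+n-k-1}
$$

This is the kernel of a Beta(α + k, β + n − k) distribution, so the posterior stays in the Beta family, and the Beta distribution is a conjugate prior for the Bernoulli likelihood.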

For a summary of the conjugate distributions for commonly used likelihood functions, along with their posterior hyperparameters and the interpretation of those hyperparameters, see this page.

To see why we tend to choose a conjugate prior as the prior, consider how the posterior is obtained from Bayes' theorem:

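$$
\begin{aligned}
p(\theta \mid x) &= \frac{p(\theta, x)}{p(x)} \\
&= \frac{p(\theta)\, p(x \mid \theta)}{p(x)} && \text{(conditional prob.)} \\
&= \frac{p(\theta)\, p(x \mid \theta)}{\int p(\theta, x)\, d\theta} && \text{(marginal distribution)} \\
&= \frac{p(\theta)\, p(x \mid \theta)}{\int p(\theta)\, p(x \mid \theta)\, d\theta} && \text{(conditional prob.)}
\end{aligned}
$$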

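Recall that the prior is a probability distribution over the model parameter θ (in many common models θ is itself a probability, so the prior is a distribution over probabilities). So p(θ) is the probability, or density, that the prior assigns to θ. If the prior is not conjugate to the likelihood, the posterior generally has no simple closed form: the integral in the denominator of the equation above often cannot be evaluated analytically, so the posterior has to be approximated numerically, which can be expensive or intractable. With a conjugate prior, the posterior is in the same family as the prior, so it is obtained simply by updating the prior's hyperparameters.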
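
The contrast can be sketched numerically. The example below is a minimal illustration, assuming a Beta prior with a Bernoulli likelihood (a standard conjugate pair chosen here for illustration, not one named in the text). The function names `beta_pdf`, `conjugate_update` and `grid_posterior`, and the data (7 successes, 3 failures, Beta(2, 2) prior), are hypothetical. It compares the closed-form hyperparameter update with a brute-force evaluation of the normalizing integral on a grid.

```python
import math


def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta, for 0 < theta < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(
        log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
    )


def conjugate_update(a, b, successes, failures):
    """Closed-form posterior hyperparameters for a Beta prior, Bernoulli data."""
    return a + successes, b + failures


def grid_posterior(a, b, successes, failures, n_grid=10_000):
    """Posterior density via Bayes' theorem, with the denominator integral
    approximated by a midpoint rule on a grid over (0, 1)."""
    h = 1.0 / n_grid
    thetas = [(i + 0.5) * h for i in range(n_grid)]
    # Numerator: prior density times likelihood of the observed data.
    unnorm = [
        beta_pdf(t, a, b) * t**successes * (1 - t) ** failures for t in thetas
    ]
    # Denominator: numerical approximation of the evidence p(x).
    evidence = sum(unnorm) * h
    return thetas, [u / evidence for u in unnorm]


# Hypothetical data: Beta(2, 2) prior, 7 successes and 3 failures observed.
a_post, b_post = conjugate_update(2.0, 2.0, 7, 3)
thetas, dens = grid_posterior(2.0, 2.0, 7, 3)

# Largest pointwise gap between the grid posterior and the closed form.
max_gap = max(
    abs(d - beta_pdf(t, a_post, b_post)) for t, d in zip(thetas, dens)
)
print(a_post, b_post)  # 9.0 5.0
print(max_gap)
```

The conjugate route needs only two additions, while the grid route evaluates the integrand at every grid point. In one dimension that is cheap, but the cost of a grid grows rapidly with the number of parameters, which is where the lack of a closed form becomes a practical problem.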

Reference

Conjugate Prior: https://en.wikipedia.org/wiki/Conjugate_prior